Generate HTML output and send it to Slack, make output files downloadable in the web UI #3

Merged: nkaretnikov merged 62 commits into nebari-dev:main on Jan 11, 2024

@nkaretnikov nkaretnikov commented Nov 27, 2023

Reference Issues or PRs

What does this implement/fix?

Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds a feature)
  • Breaking change (fix or feature that would cause existing features not to work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build related changes
  • Other (please describe):

Testing

  • Did you test the pull request locally?
  • Did you add new tests?

Documentation

Access-centered content checklist

Text styling

  • The content is written with plain language (where relevant).
  • If there are headers, they use the proper header tags (with only one level-one header: H1 or # in markdown).
  • All links describe where they link to (for example, check the Nebari website).
  • This content adheres to the Nebari style guides.

Non-text content

  • All content is represented as text (for example, images need alt text, and videos need captions or descriptive transcripts).
  • If there are emojis, there are not more than three in a row.
  • Don't use flashing GIFs or videos.
  • If the content were to be read as plain text, it still makes sense, and no information is missing.

Any other comments?

@nkaretnikov
Collaborator Author

nkaretnikov commented Nov 27, 2023

Summary:

  • adds a send_to_slack step to scheduled and one-time workflows
  • it uses the Slack API to send HTML output to a specified Slack channel
  • adds a call to jupyter nbconvert to generate HTML
  • configured via the SLACK_TOKEN and SLACK_CHANNEL "Parameters" under "Notebook Jobs" in the web UI; the code reads them as environment variables
  • see the Slack API docs on how to configure a bot to send a file to a channel -- this needs to be done first for the bot/sending functionality to work
  • this new step is integrated with update_job_status_failure, so it will be visible in the UI if it fails
  • the Slack script also has some printing and additional validation, so an exception will be raised on failure, which will cause the job to fail
  • cmd_args generation is changed because (1) two commands are now called there and (2) it's passed to /bin/sh as a string anyway, so there is no point in keeping it in a list
  • changes the *path functions to return Path objects, which is more flexible in case callers want to modify these paths.
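The cmd_args point above can be sketched roughly as follows. This is an illustration, not the code from this PR: the function name build_cmd and the file names are hypothetical, and it assumes only the basic command-line forms of papermill (input, then output) and jupyter nbconvert --to html.

```python
import shlex
from pathlib import Path


def build_cmd(input_path: Path, output_path: Path) -> str:
    """Return one shell string that runs the notebook, then converts it to HTML."""
    # papermill INPUT OUTPUT executes the notebook and writes the executed copy.
    run_notebook = (
        f"papermill {shlex.quote(str(input_path))} {shlex.quote(str(output_path))}"
    )
    # jupyter nbconvert --to html writes an .html file next to the notebook.
    to_html = f"jupyter nbconvert --to html {shlex.quote(str(output_path))}"
    # "&&" makes the second command run only if the first one succeeds.
    return f"{run_notebook} && {to_html}"


# Hypothetical file names, used only for illustration.
cmd = build_cmd(Path("input.ipynb"), Path("output.ipynb"))
print(cmd)
```

Because the result is a single string, it can be handed to /bin/sh -c as is, and quoting each path guards against spaces in file names.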

Notes:

  • Since there are multiple steps, it takes time to spawn and execute them. This means that configuring "Run on a schedule" with a "Minute" interval is only enough to start the main job and send to Slack, but not to update the status. After that, the whole workflow is restarted. Use a longer interval, for example, */5 * * * * (every 5 mins).
  • This requires papermill to be part of the environment used when scheduling a job.
  • The jupyter command is available globally, so it doesn't need to be in the environment.
  • "Output formats" checkboxes in the UI (Notebook, HTML) do nothing. Both of these formats are always generated.

@nkaretnikov
Collaborator Author

nkaretnikov commented Nov 27, 2023

Testing checklist:

  • raising an exception in send_to_slack causes the workflow to fail (visible in the UI)
  • non-zero exit code in command in send_to_slack causes the workflow to fail (visible in the UI)
  • "Run now" sends to Slack and updates the status in the UI
  • "Run on a schedule" (*/5 * * * *) sends to Slack and updates the status in the UI (all steps are executed)
  • When SLACK_TOKEN and SLACK_CHANNEL are not specified, "Run now" works and updates the status in the UI, nothing is sent to Slack
  • When SLACK_TOKEN and SLACK_CHANNEL are not specified, "Run on a schedule" (*/5 * * * *) works and updates the status in the UI, nothing is sent to Slack
  • A message is sent to Slack with a valid HTML file, which is generated when the job is run. Easy to validate via:
import datetime
datetime.datetime.now()

@dharhas
Member

dharhas commented Nov 28, 2023

So how will this be configured? i.e. "send to slack" is not a feature that all nebari / jupyter-scheduler users will need. Also, someone else might want to send it to Mattermost or another REST API. Is there a way to make this a bit more generic?

@nkaretnikov
Collaborator Author

@dharhas

So how will this be configured? i.e. "send to slack" is not a feature that all nebari / jupyter-scheduler users will need.

Currently, it'll only execute this task if you provide SLACK_TOKEN and SLACK_CHANNEL as Parameters when scheduling the notebook. If you don't provide this, nothing will be sent.
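The opt-in behavior described here might look roughly like the sketch below. The helper name slack_settings is hypothetical and not taken from the PR; only the SLACK_TOKEN and SLACK_CHANNEL names come from the description.

```python
import os
from typing import Mapping, Optional, Tuple


def slack_settings(env: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (token, channel) when both are set, otherwise None."""
    token = env.get("SLACK_TOKEN")
    channel = env.get("SLACK_CHANNEL")
    # Treat a missing or empty value as "Slack not configured".
    if not token or not channel:
        return None
    return token, channel


settings = slack_settings(os.environ)
if settings is None:
    print("SLACK_TOKEN/SLACK_CHANNEL not set, skipping the Slack step")
else:
    print(f"Would send the HTML output to channel {settings[1]}")
```

Passing the environment in as a mapping keeps the check easy to exercise without touching the real process environment.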

Is there a way to make this a bit more generic?

Technically, we can turn this into "specify a random shell command and I'll execute it", but I don't think it's a good design.

  • Users might run into issues with string escaping
  • This prevents us from doing API-specific checking of whether the request was successful or not.

I'd suggest we add support for additional APIs separately, on a case by case basis.

@nkaretnikov nkaretnikov marked this pull request as ready for review December 4, 2023 02:24

@dharhas
Member

dharhas commented Dec 5, 2023

@nkaretnikov let's add docs. Also, I think we need to make sure runs are timestamped.

Are they saved to disk as well as sent to Slack? "Send to Slack" needs to be optional.

@nkaretnikov
Collaborator Author

Is there an example (a screenshot maybe) of slack output in a channel or something?

[Screenshot: Slack output in the channel, 2023-12-05]

Slack previews HTML as source code here. I think they don't render it by default for security reasons. I've looked and I'm not sure there's a way to render it. Once you download it, it's valid HTML.

The actual value can also be 1969-12-31-06-00-00-PM, so remove the exact
date to avoid confusion.
@nkaretnikov nkaretnikov changed the title from "Generate HTML output and send it to Slack" to "Generate HTML output and send it to Slack, make output files downloadable in the web UI" on Dec 9, 2023
@nkaretnikov
Collaborator Author

Manually tested everything as of this commit (3a42dd0):

@nkaretnikov nkaretnikov requested a review from aktech December 9, 2023 18:13
@nkaretnikov
Collaborator Author

@aktech I've tested and reviewed this. PTAL

Member

@aktech aktech left a comment

Hey @nkaretnikov

Thanks for taking another pass at this. I am having a hard time understanding the flow and the reasoning for various things here, even after reading some of your notes, maybe because they are scattered here and there and probably because of my lack of understanding of Argo Workflows. I see you have added usage documentation. Can you please add a short working comment explaining step by step what's happening here architecturally, for example:

  • We schedule a job from papermill with parameters x, y
  • Then we create a job for running the given notebook via Argo workflows...
  • Then we rename files because of reason m, n, o, etc... and that's the most apt way because of reason j, k, l
  • ....

We can then put this doc in the code itself as well, since not everyone contributing would be very familiar with Argo workflows.


try:
    # Sets up logging
    logger = setup_logger("rename_files")
Member

This shadows the name logger from the outer scope.

Collaborator Author

Nope, that's on purpose. I've just re-tested to be sure, too. The global logger name is not accessible here. Things used in these scripts need to be local to them because they'll be running as separate pods. That's why they have these local imports. And you also cannot pass arbitrary Python objects as parameters - only very basic serializable things, like strings and dicts.
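The constraint described in this reply can be illustrated with a small sketch. It assumes a hypothetical step body named rename_files_script and made-up file names, and it stands in for how a function shipped to a separate pod has to carry its own imports and take only serializable parameters.

```python
import json


def rename_files_script(params_json: str) -> str:
    """Hypothetical step body: everything it needs is imported or built inside."""
    # Local imports: module-level names from the submitting process would not
    # exist where only this function's body is run.
    import json
    import logging

    # A logger created here is local to the step, not the outer-scope logger.
    logger = logging.getLogger("rename_files")
    params = json.loads(params_json)  # parameters arrive as plain strings
    logger.info("renaming %s", params["old_name"])
    return f'{params["old_name"]} -> {params["new_name"]}'


# The caller can only hand over serializable values, so a dict is sent as JSON.
payload = json.dumps({"old_name": "output.ipynb", "new_name": "output-renamed.ipynb"})
print(rename_files_script(payload))
```

The repeated import inside the function is deliberate here: it mirrors the "local imports" mentioned in the reply.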


except Exception as e:
    msg = "Failed to rename files"
    logger.info(msg)
Member

Use logger.exception to log the full traceback.
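A minimal sketch of the difference, using only the standard logging module. The logger name and the in-memory stream are chosen for the demo and are not taken from the PR.

```python
import io
import logging

# Capture log output in memory so it can be inspected.
stream = io.StringIO()
logger = logging.getLogger("rename_files_demo")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

try:
    raise FileNotFoundError("output.ipynb")
except Exception:
    # logger.exception logs at ERROR level and appends the current traceback;
    # logger.info(msg) would record only the message text.
    logger.exception("Failed to rename files")

log_text = stream.getvalue()
print(log_text)
```

The captured text contains the message followed by the traceback and the exception type, which is what makes a failed step easier to diagnose from a log file.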

try:
    # Sets up logging
    logger = setup_logger("rename_files")
    add_file_logger(logger, log_path)
Member

Why are these two lines inside the try block?

If there is an exception here, you won't have the logger variable in the except block anyway.
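One way to address this point is sketched below. The setup_logger here is a stand-in with assumed behavior, not the project's helper, and rename_files takes a callable purely so the example is self-contained.

```python
import logging


def setup_logger(name: str) -> logging.Logger:
    """Stand-in for the project's helper of the same name (assumed behavior)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    return logger


def rename_files(do_rename) -> bool:
    # Created before the try block, so the name is always bound in except.
    logger = setup_logger("rename_files")
    try:
        do_rename()
    except Exception:
        logger.exception("Failed to rename files")
        return False
    return True


def failing_rename():
    raise OSError("disk full")


print(rename_files(lambda: None))
print(rename_files(failing_rename))
```

With this layout, a failure in the logging setup itself surfaces as an ordinary exception instead of a NameError raised from the except block.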

)

failure += " || {{steps.rename-files.status}} == Failed"
successful += " && {{steps.rename-files.status}} == Succeeded"
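The two lines above extend the workflow's success and failure conditions by string concatenation. An equivalent way to build such expressions from a list of step names is sketched here; the function name build_conditions and the step name "main" are assumptions, and only "rename-files" appears in the snippet.

```python
def build_conditions(step_names):
    """Return (successful, failure) condition strings for the given steps."""
    # Succeed only if every step succeeded.
    successful = " && ".join(
        f"{{{{steps.{name}.status}}}} == Succeeded" for name in step_names
    )
    # Fail if any single step failed.
    failure = " || ".join(
        f"{{{{steps.{name}.status}}}} == Failed" for name in step_names
    )
    return successful, failure


# "main" is a hypothetical step name; "rename-files" comes from the snippet.
successful, failure = build_conditions(["main", "rename-files"])
print(successful)
print(failure)
```

The doubled braces in the f-strings produce the literal {{ and }} that the template syntax expects.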
Member

The start_time value needs to be generated within that step. But we cannot pass the start_time value directly between these steps. Instead, we use the database.

What do you mean by generated? Isn't start_time the time the job starts, rather than something that is generated?

@nkaretnikov nkaretnikov requested a review from aktech December 18, 2023 03:17
@nkaretnikov
Collaborator Author

@aktech PTAL. Made the changes you suggested, added more info to the internals section of README. Tested to make sure it's working and the backtraces are logged to a file.

@nkaretnikov nkaretnikov merged commit 7477f21 into nebari-dev:main Jan 11, 2024
5 checks passed
@nkaretnikov
Collaborator Author

I went ahead and merged this since it'd be nice to have as part of the current Nebari release, see nebari-dev/nebari#2195 (comment).
